fix(mcp): surface stored metadata and updated_at in detailed recall format #184

Merged
jack-arturo merged 2 commits into develop from fix/111-mcp-metadata-detailed on Jun 11, 2026

@jack-arturo

Summary

The REST /recall API already returns metadata, updated_at, and last_accessed — the gap was the MCP server's detailed format, which omitted them (making custom metadata effectively write-only for MCP agents, the actual complaint in #111).

  • formatRecallAsItems() detailed branch now renders an Updated: line and a size-capped single-line Metadata: JSON (300 chars + ellipsis; omitted when empty) — capped because a prior raw-dump attempt was rejected for verbosity.
  • json format was already a raw passthrough; now pinned by a transport-level test.
  • text/items formats unchanged.
  • New REST contract test (test_recall_metadata_roundtrip) locks the server-side behavior; docs/METADATA_BEHAVIOR.md corrected (it over-claimed that detailed already exposed metadata).
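
The capping rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in `mcp-sse-server/server.js`: the helper names `formatMetadataLine` and `detailedExtras` are hypothetical, and the exact ellipsis string (`...` here) is an assumption. Only the `Updated:` and `Metadata:` labels, the 300-character cap, and the omit-when-empty behavior come from the summary.

```javascript
// Hypothetical sketch of the detailed-format additions described in this PR.
// Helper names and the "..." ellipsis are assumptions, not the actual server.js code.
const METADATA_MAX_CHARS = 300; // cap stated in the PR summary

// Returns a single-line "Metadata: ..." string, or null when there is nothing to show.
function formatMetadataLine(metadata, maxChars = METADATA_MAX_CHARS) {
  if (metadata === null || typeof metadata !== "object") return null;
  if (Object.keys(metadata).length === 0) return null; // omitted when empty
  // No indent argument, so JSON.stringify emits one line (embedded newlines are escaped).
  const json = JSON.stringify(metadata);
  const capped = json.length > maxChars ? json.slice(0, maxChars) + "..." : json;
  return `Metadata: ${capped}`;
}

// Builds the extra lines appended for one memory in the detailed format.
function detailedExtras(memory) {
  const lines = [];
  if (memory.updated_at) lines.push(`Updated: ${memory.updated_at}`);
  const metadataLine = formatMetadataLine(memory.metadata);
  if (metadataLine) lines.push(metadataLine);
  return lines;
}
```

Under these assumptions, a memory with `metadata: {}` and no `updated_at` contributes no extra lines, and a serialized metadata object of exactly 300 characters is rendered in full, with truncation starting at 301.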

Testing

  • Node: 15/15 (npm test, includes truncation-boundary and empty-metadata cases)
  • Python: 488 passed, 12 skipped; black + flake8 clean

Closes #111

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings June 11, 2026 01:45

Copilot AI left a comment

Pull request overview

This PR updates the MCP SSE server’s /recall tool formatting so the detailed output surfaces updated_at and (size-capped) metadata, aligning MCP output more closely with the existing REST /recall payload and addressing the “metadata is write-only” complaint from #111. It also adds contract-style tests and corrects documentation around metadata visibility.

Changes:

  • MCP formatRecallAsItems(..., { detailed: true }) now renders an Updated: line and a 300-char capped single-line Metadata: JSON (omitted when empty/missing).
  • Added Node tests to pin detailed metadata truncation/omission behavior and to pin format=json passthrough of metadata, updated_at, and last_accessed.
  • Added a REST roundtrip test for storing metadata via POST /memory and verifying it surfaces in GET /recall; updated docs accordingly.
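
To illustrate why the `json` format needs only a pinning test while `detailed` needed a code change, here is a small sketch of a format dispatcher. It is an assumption-laden illustration: `renderRecall` is a hypothetical name, and the `results[].memory` payload shape is inferred from the PR text rather than taken from the server.

```javascript
// Hypothetical dispatcher; renderRecall and the payload shape are illustrative assumptions.
function renderRecall(payload, format) {
  if (format === "json") {
    // Raw passthrough: serialize the REST payload untouched so every field survives.
    return JSON.stringify(payload);
  }
  // Human-readable formats choose which fields to print, so anything
  // they do not render explicitly is invisible to the caller.
  return payload.results
    .map((result, index) => `${index + 1}. ${result.memory.content}`)
    .join("\n");
}

const payload = {
  results: [
    {
      memory: {
        content: "Prefers dark mode",
        metadata: { source: "cli" },
        updated_at: "2026-06-11T19:06:00Z",
        last_accessed: "2026-06-11T19:07:00Z",
      },
    },
  ],
};

// The json format round-trips metadata, updated_at, and last_accessed unchanged.
const roundTripped = JSON.parse(renderRecall(payload, "json"));
// The plain rendering shows only what the formatter chose to print.
const plainText = renderRecall(payload, "text");
```

A passthrough like this cannot drop fields on its own, which is why a transport-level test is enough to guard it against future regressions.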

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| tests/test_api_endpoints.py | Adds a REST contract test ensuring stored metadata + timestamps round-trip through /recall. |
| mcp-sse-server/test/server.test.js | Extends formatting tests for detailed metadata/updated output and adds a transport-level json passthrough test. |
| mcp-sse-server/server.js | Implements Updated: rendering and size-capped Metadata: rendering in detailed recall formatting; updates tool format description. |
| docs/METADATA_BEHAVIOR.md | Corrects documentation to reflect that MCP detailed now renders (capped) metadata and Updated: when present. |

Comment thread mcp-sse-server/server.js
jack-arturo and others added 2 commits June 11, 2026 21:05
…ormat (#111)

The REST /recall API already returns parsed memory.metadata plus
updated_at/last_accessed; the gap was the MCP server's detailed
formatter, which omitted them entirely. The detailed format now
renders:

- an Updated: line when updated_at is present (parallel to the
  existing Last accessed handling), and
- a size-capped Metadata: line — single-line JSON truncated to 300
  chars with a trailing ellipsis, omitted when metadata is missing
  or empty — so provenance fields surface without dumping raw
  metadata verbosely.

The json format remains a raw passthrough (already exposed metadata)
and is now locked by a transport-level test. text/items formats are
unchanged.

Adds a REST contract test (test_recall_metadata_roundtrip) locking
the store -> recall metadata/timestamp round-trip, and fixes
docs/METADATA_BEHAVIOR.md, which over-claimed that the detailed
format already exposed metadata.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@jack-arturo jack-arturo force-pushed the fix/111-mcp-metadata-detailed branch from a4f8c57 to 85e5eb0 Compare June 11, 2026 19:06
@jack-arturo jack-arturo changed the base branch from main to develop June 11, 2026 19:06
@jack-arturo jack-arturo merged commit 230416e into develop Jun 11, 2026
5 checks passed
@jack-arturo jack-arturo deleted the fix/111-mcp-metadata-detailed branch June 11, 2026 19:07
jack-arturo added a commit that referenced this pull request Jun 12, 2026
…nce gate, date-aware ranking (#182, #193, #186, #187, #183, #184, #188) (#194)

## Release: ranking & recall series (develop → main)

⚠️ **Merge with a MERGE COMMIT — do not squash.** release-please needs
the individual conventional commits below to compute the version and
changelog for PR #154.

### What's in this release

| PR | Change | Default behavior |
|---|---|---|
| #182 | `feat(recall)`: configurable recency decay window/curve | unchanged (env-gated) |
| #193 (replaces #185) | `feat(recall)`: tag-score denominator cap fixes query-length bias | unchanged (`SEARCH_TAG_SCORE_TOKEN_CAP=0`) |
| #186 | `fix(recall)`: relevance gate; query-independent scoring gated on topical evidence (#130) | unchanged (gate off) |
| #187 | `feat(recall)`: date-aware ranking, `recency_bias=off\|on\|auto`, latest-fact selection (#158, #159) | `RECALL_RECENCY_BIAS=off`; adds deterministic timestamp tiebreak for near-ties |
| #183 | `feat(benchmarks)`: failure-mode diagnosis harness + judge quota preflight | tooling only |
| #184 | `fix(mcp)`: surface stored metadata + `updated_at` in detailed recall format (#111) | additive |
| #188 | `feat(enrichment)`: classification fallback-rate metrics in `/enrichment/status` | additive |

Plus: CI now runs on `develop` pushes/PRs; benchmark experiment log +
README contribution-policy note.

### Verification evidence

- **Unit/lint/npm**: 625 pytest + 16 mcp-sse-server tests green on
develop head; CI green.
- **Default-preserve**: recall-lab baseline on the 10k-memory production
snapshot — develop defaults vs main pooled baseline identical aggregates
(R@5 0.655 / R@10 0.710 / MRR 0.434 / NDCG@10 0.501). Two-stack probe
run (main vs develop, defaults): 11/12 preserve-exact, remaining diffs
are near-tie reorders (top-1 score deltas ≤ 5.4e-5, the #187 timestamp
tiebreak).
- **Full judged 500q LongMemEval** (ship config:
`RECALL_RECENCY_BIAS=auto` + `temporal-answer` harness): recall@5 96.6%
(483/500), accuracy 86.0% (430/500), `judge_errors=0`,
`memory_ingest_failures=0`.
- **Churn attribution** (targeted re-runs of all 17 churned questions on
current-main-at-defaults and develop-at-defaults): 15/17 moved with #191
(already on main) — the April canonical 97.2% floor is stale; current
main measures ~97.0%. Develop-at-defaults differs from current main by
**1 question in 500** (a near-tie rank-5/6 flip from #187's
deterministic tiebreak). Accuracy is within answerer replicate noise
(identical-config reference runs flip 28/500 answers).
- Full detail: `benchmarks/EXPERIMENT_LOG.md` (2026-06-11 entry) and
`benchmarks/results/lme_churn17_*` + `analyze_churn17.py`.

### Opt-in features shipped OFF

`RECALL_RELEVANCE_GATE` (validated at 0.40 on lab corpus; improves
negative-probe precision) and `RECALL_RECENCY_BIAS=auto` (current-state
query re-ranking). Neither affects default behavior; see
`docs/ENVIRONMENT_VARIABLES.md`.

### After merging

release-please will update PR #154 (v0.16.0); merging *that* cuts the
tag and publishes the `:stable` image — the actual user-facing deploy
event for Railway template users.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Development

Successfully merging this pull request may close these issues.

Return metadata fields in recall responses by default

2 participants